The recently confirmed Dejean's conjecture about the threshold between avoidable and unavoidable
powers of words gave rise to interesting and challenging problems on the structure and growth of
threshold words. Over any finite alphabet with $k \ge 5$ letters, Pansiot words avoiding 3-repetitions
form a regular language, which is a rather small superset of the set of all threshold words. Using
cylindric and 2-dimensional words, we prove that, as $k$ approaches infinity, the growth rates of
complexity for these regular languages tend to the growth rate of complexity of some ternary
2-dimensional language. The numerical estimate of this growth rate is $\approx 1.2421$.

Powers, integral and fractional, are the simplest and most natural repetitions in words. Any repetition 
over an arbitrary fixed alphabet is characterized by the set of all words over this alphabet, avoiding this 
repetition. The main question concerning such a set is whether it is finite or infinite. For fractional 
powers, this question is answered by Dejean's conjecture [5], which is now proved in all cases by the
efforts of different authors.

Recall that the exponent of a word $w$ is the ratio between its length and its minimal period:
$\exp(w) = |w|/\mathrm{per}(w)$. If $\exp(w) = \beta > 1$, then $w$ is a fractional power ($\beta$-power). It is
convenient to treat the notion of $\beta$-power as follows: a word $w$ is a $\beta$-power if $\exp(w) \ge \beta$
while $(|w|-1)/\mathrm{per}(w) < \beta$, and a $\beta^+$-power if $\exp(w) > \beta$ while $(|w|-1)/\mathrm{per}(w) \le \beta$.
As usual, $\beta^+$ is treated as a "number", covering $\beta$ in the usual $\le$ order. A word is called
$\beta$-free (where $\beta$ can be a number with plus as well) if it contains no $\beta$-powers as factors.
A $\beta$-power is $k$-avoidable if the number of $k$-ary $\beta$-free words is infinite. Dejean's conjecture
states that a $\beta$-power is $k$-avoidable if and only if

$$\beta \ge (7/4)^+ \text{ and } k = 3, \qquad \beta \ge (7/5)^+ \text{ and } k = 4, \qquad \text{or} \qquad \beta \ge \bigl(k/(k-1)\bigr)^+ \text{ and } k = 2,\ k \ge 5.$$

The $(k/(k-1))^+$-free languages over $k$-letter alphabets, where $k \ge 5$, are called threshold
languages; we denote them by $T_k$. We study the structure and growth of these languages, aiming at the
asymptotic properties as the size of the alphabet increases.
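The definitions above can be checked mechanically on short words. The following Python sketch is only an illustration under stated assumptions: the function names are ours, not the paper's, and the freeness test relies on the observation that a word is $\beta^+$-free exactly when none of its nonempty factors has exponent greater than $\beta$.

```python
from fractions import Fraction


def minimal_period(w: str) -> int:
    """Smallest p >= 1 such that w[i] == w[i + p] for all valid i."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return n  # only reached for the empty word


def exponent(w: str) -> Fraction:
    """exp(w) = |w| / per(w) for a nonempty word, as an exact fraction."""
    return Fraction(len(w), minimal_period(w))


def is_beta_plus_free(w: str, beta: Fraction) -> bool:
    """True if no nonempty factor of w has exponent strictly greater than beta."""
    n = len(w)
    return all(exponent(w[i:j]) <= beta
               for i in range(n) for j in range(i + 1, n + 1))
```

For instance, with $k = 5$ the threshold is $5/4$: the word `abcdea` has exponent $6/5$ and passes `is_beta_plus_free("abcdea", Fraction(5, 4))`, while `abca` has exponent $4/3$ and does not.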

Any threshold language can be approximated from above by a series of regular languages
consisting of words that locally satisfy the $(k/(k-1))^+$-freeness property. Namely, these words avoid
all $(k/(k-1))^+$-powers $w$ such that $|w| - \mathrm{per}(w) \le m$, for some constant $m$. From our previous
work [12], it is clear that the case $m = 3$ gives a lot of important structural information about the
languages $T_k$. Here we study this case in detail, using a cylindric representation that captures the
properties common to the considered words over all alphabets.

1 Preliminaries 

We study finite words and two-sided infinite words (Z-words) over finite $k$-letter alphabets and over
some special ternary alphabet introduced below. We also consider 2-dimensional words, which are just
finite rectangular arrays of alphabetic symbols. Unlike some commonly used models of 2-dimensional
words (cf. [1]), we do not use additional symbols to mark the borders of such a word. Factors of
2-dimensional words are also 2-dimensional words.

A (1- or 2-dimensional) language is factorial, if it is closed under taking factors of its words. A word 
w avoids a word u if u is not a factor of w. The set of all minimal (with respect to the factor order) words 
avoided by all elements of a factorial language L is called the antidictionary of L. All 1-dimensional
languages with finite antidictionaries are regular. 

We denote the antidictionary of the threshold language $T_k$ by $A_k$. A word $u \in A_k$ can be factorized
as $u = yzy$, where $|yz| = \mathrm{per}(u)$, $|u|/|yz| > k/(k-1)$, and all proper factors of $u$ have exponent at
most $k/(k-1)$. If $|y| = m$, we call $u$ an $m$-repetition.

The finite set $A_k^{(m)} \subset A_k$ consists of all $r$-repetitions with $r \le m$. The notation $T_k^{(m)}$ is used
for the (regular) language with the antidictionary $A_k^{(m)}$. Then $T_k \subseteq T_k^{(m)}$. Since an infinite regular
language contains arbitrary powers of some word, one has $T_k \subsetneq T_k^{(m)}$. Clearly, $T_k = \bigcap_{m=1}^{\infty} T_k^{(m)}$.

The combinatorial complexity of a language $L$ is the function $C_L(n)$ which returns the number of words
in $L$ of length $n$. This function serves as a natural quantitative measure of $L$. "Big" ["small"] languages
have exponential [resp., subexponential] complexity. Exponential complexity can be described by means
of the growth rate $\alpha(L) = \limsup_{n\to\infty} (C_L(n))^{1/n}$ (subexponential complexity is indicated by
$\alpha(L) = 1$). For factorial languages, the classical Fekete's lemma implies

$$\alpha(L) = \lim_{n\to\infty} (C_L(n))^{1/n} = \inf_{n} (C_L(n))^{1/n}.$$
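For a language given by a finite antidictionary, the complexity function can be computed directly for small lengths. The sketch below is a brute-force illustration only; the function name and the toy language (binary words avoiding the factor `11`) are our assumptions and do not appear in the paper.

```python
def complexity(alphabet: str, antidictionary: set[str], n_max: int) -> list[int]:
    """Return [C_L(0), ..., C_L(n_max)] for the factorial language L of all
    words over `alphabet` that contain no factor from `antidictionary`."""
    words = [""]
    counts = [1]
    for _ in range(n_max):
        extended = []
        for w in words:
            for a in alphabet:
                x = w + a
                # w is already in L, so a new forbidden factor must be a suffix of x.
                if not any(x.endswith(u) for u in antidictionary):
                    extended.append(x)
        words = extended
        counts.append(len(words))
    return counts


# Hypothetical toy language: binary words avoiding the factor "11".
counts = complexity("01", {"11"}, 12)
# n-th roots of C_L(n); by Fekete's lemma their infimum is the growth rate.
roots = [counts[n] ** (1.0 / n) for n in range(1, 13)]
```

For this toy language the counts are Fibonacci numbers, and the values in `roots` stay above the growth rate $(1+\sqrt{5})/2$, in agreement with the infimum formula.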

The growth rate of $T_k^{(m)}$ approximates the growth rate of $T_k$ from above. It is easy to prove that
$\lim_{m\to\infty} \alpha(T_k^{(m)}) = \alpha(T_k)$.

For regular languages, the growth rate equals the index (the spectral radius of the adjacency matrix)
of a recognizing automaton, provided that this automaton is consistent (each vertex belongs to some
accepting walk), and either deterministic, or non-deterministic but unambiguous (there is at most one
walk with the given label between two given vertices); see [13].
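To illustrate how the index of an automaton yields a growth rate, here is a minimal power-iteration sketch in pure Python. It is not the method of [13]; the function name and the example automaton (two states recognizing binary words without the factor `11`, all states initial and terminal) are our assumptions, and the iteration is only guaranteed to converge for primitive matrices.

```python
def spectral_radius(adj: list[list[int]], iterations: int = 200) -> float:
    """Estimate the largest eigenvalue of a nonnegative primitive matrix
    by power iteration."""
    n = len(adj)
    v = [1.0] * n
    value = 0.0
    for _ in range(iterations):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        value = max(w)              # eigenvalue estimate, since max(v) == 1
        v = [x / value for x in w]  # renormalize so that max(v) == 1 again
    return value


# State 0: last letter was 0; state 1: last letter was 1 (so 1 cannot follow).
golden_mean = [[1, 1],
               [1, 0]]
index = spectral_radius(golden_mean)
```

Here `index` approximates $(1+\sqrt{5})/2$, the growth rate of the toy language.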

In [10], Pansiot showed how to encode all words from the language $T_k^{(2)}$ with "characteristic" words
over the alphabet {0,1}. This encoding played a big role in the proof of Dejean's conjecture; so, we refer
to the elements of $T_k^{(2)}$ as Pansiot words. These words can be equivalently defined by the following
pair of conditions:

(P1) two closest occurrences of a letter are at distance $k-1$, $k$, or $k+1$;
(P2) two closest occurrences of a letter are followed by different letters.

We also consider Pansiot Z-words, which are given by (P1), (P2) as well. Finite factors of Pansiot
Z-words are exactly Pansiot words. 
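The two conditions translate into a short check on finite words. The sketch below is a literal reading of (P1) and (P2), with our own function name; it compares followers only when both exist inside the word, which is an assumption about how to treat the right border of a finite word.

```python
def is_pansiot(word: str, k: int) -> bool:
    """Literal check of conditions (P1) and (P2) on a finite word."""
    last_seen: dict[str, int] = {}
    for i, letter in enumerate(word):
        if letter in last_seen:
            j = last_seen[letter]
            # (P1): closest occurrences are at distance k-1, k, or k+1.
            if i - j not in (k - 1, k, k + 1):
                return False
            # (P2): the letters following the two occurrences differ
            # (checked only if the later occurrence has a follower).
            if i + 1 < len(word) and word[j + 1] == word[i + 1]:
                return False
        last_seen[letter] = i
    return True
```

For $k = 5$, the word `abcdeacbed` passes, `abca` violates (P1), and `abcdeab` violates (P2) because both occurrences of `a` are followed by `b`.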

Now we introduce the cylindric representation of Pansiot words. Imagine such a word (finite or infinite)
as a rope with knots, which represent letters. This rope is wound around a cylinder such that the
knots at distance $k$ are placed one under another (Fig. 1a). By (P1), the knots labeled by two closest
occurrences of the same letter appear on two consecutive winds of the rope one under another or shifted
by one knot (Fig. 1b). If we connect these closest occurrences by "sticks", we get three types of such
sticks: vertical, left-slanted, and right-slanted (Fig. 1b). We associate each letter in a Pansiot word
with the stick going up from the corresponding knot, getting an encoding of this word by a cylindric word
over the ternary alphabet A = { |, /, \ }. Since the sticks allow one to establish equality of letters in a
Pansiot word, such a cylindric word [Z-word] uniquely represents the original word [resp., Z-word] up
to a permutation of the alphabet. Note that cylindric words avoid squares of letters in view of (P2).
Hence, cylindric Z-words are just infinite sequences of blocks |/\ and /\.
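The encoding can be sketched in a few lines of Python. The orientation used here is an assumption on our side (the figure is not available in this text): a next occurrence at distance $k$ gives `|`, at distance $k+1$ gives `/`, and at distance $k-1$ gives `\`. The function name is ours.

```python
STICKS = {-1: "\\", 0: "|", 1: "/"}  # shift of the next occurrence relative to k


def cylindric_encoding(word: str, k: int) -> str:
    """Encode a finite Pansiot word by its sticks, stopping at the first
    knot whose next occurrence lies outside the word."""
    out = []
    for i, letter in enumerate(word):
        j = word.find(letter, i + 1)  # next occurrence of the same letter
        if j == -1:
            break
        shift = j - i - k
        if shift not in STICKS:
            raise ValueError(f"(P1) fails at position {i}")
        out.append(STICKS[shift])
    return "".join(out)
```

With $k = 5$, the word `abcdeacbed` is encoded as `|/\/\`, that is, one block `|/\` followed by one block `/\`.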

The feature of cylindric words is that they have an additional 2-dimensional structure, allowing one 
to capture structural properties of Pansiot words through 2-dimensional factors of cylindric words. We 
say that a Z-word W is compatible to a language L if all factors of W belong to L. 

Theorem 1 ([12]). For any integer $m \ge 3$, there exists a set $S_m$ of 2-dimensional words of size
$O(m) \times O(m)$ over A such that for any $k \ge 2m-3$, a Pansiot Z-word $W$ over a $k$-letter alphabet is
compatible to $T_k^{(m)}$ if and only if the corresponding cylindric Z-word has no 2-dimensional factors from $S_m$.

This theorem states that cylindric words that encode the words from $T_k^{(m)}$ are defined by
2-dimensional avoidance properties. For example, cylindric words of the Pansiot words avoiding
3-repetitions are defined by the avoidance of two structures: the vertical factor of height 2 (a stick |
directly above another |) and the square factor of size 2 x 2 (the pair \/ directly above another
pair \/). Indeed, any of these structures implies the existence of three successive letters (say, a, b,
and c) in the encoded Pansiot word such that two occurrences of the factor abc appear one under another
at distance $2k$; since $(2k+3)/2k > k/(k-1)$ for every $k > 3$, the encoded word contains a 3-repetition.

For a language $L$, let $\widehat{L}$ be its subset consisting of all factors of Z-words compatible to $L$.
By [14, Theorem 3.1], $\alpha(\widehat{L}) = \alpha(L)$. Let $\mathrm{Cyl}_k^{(m)}$ be the set of all factors of cylindric Z-words
encoding Pansiot Z-words compatible to $T_k^{(m)}$. Then clearly
$\alpha(\mathrm{Cyl}_k^{(m)}) = \alpha(\widehat{T}_k^{(m)}) = \alpha(T_k^{(m)})$. Thus, the growth rates
of threshold languages can be estimated through the study of cylindric words with simple avoidance
properties that are independent of the size of the alphabet. In what follows, we refer to the elements of
$\mathrm{Cyl}_k^{(m)}$ as cylindric factors.

The above considerations imply two natural conjectures: for any fixed $m \ge 3$, the sequence
$\{\alpha(T_k^{(m)})\}_{k \ge 5}$ has a limit as $k$ approaches infinity, and this limit is the "growth rate" of the
2-dimensional language defined by the same avoidance properties as $\mathrm{Cyl}_k^{(m)}$. Through the computations
of growth rates for the alphabets with 5, 6, ..., 60 letters we observed in [12] that the sequence
$\{\alpha(T_k^{(3)})\}$ demonstrates fast convergence to the limit $\approx 1.242096777$.

In this paper, we confirm both conjectures for the case $m = 3$. The corresponding 2-dimensional
language will be denoted by $D$; it consists of all rectangular words over A containing none of the
minimal structures avoided by cylindric words in this case: the forbidden factors of length 2, the
vertical factor of height 2, and the square factor of size 2 x 2.
In fact, the case $m = 3$ is the crucial one to approximate the growth rates of threshold languages, because
in [12] it was shown that

- there are no 4- and 5-repetitions;

- $m$-repetitions with $m \ge 6$ do not significantly affect the growth rate, as far as we can check this by
extensive computer-assisted studies based on the results of [13].



2 Two-dimensional languages 

The combinatorial complexity $C_L(n,k)$ of a 2-dimensional language $L$ is the function returning the number
of $n \times k$ words in $L$. If $L$ is factorial, then its growth rate is defined by the formula

$$\alpha(L) = \lim_{n,k\to\infty} (C_L(n,k))^{1/nk}. \qquad (1)$$

The function $C_L(n,k)$ in this case is submultiplicative in each variable, and hence the existence of the
limit (1) follows from the multivariate version of Fekete's lemma.

On the other hand, it is completely unclear how to calculate the growth rates of 2-dimensional
languages. For the 1-dimensional case, the growth rate of a regular language can be found quite efficiently,
see [13]. Here we give one idea how to estimate the growth rate of a 2-dimensional language. Since
the limit (1) exists, we can take any "diagonal" subsequence of $C_L(n,k)$; we choose $\{(C_L(n,n))^{1/n^2}\}_{n \ge 1}$.
Applying Stolz's Theorem (see [6]) twice, we get

$$\alpha(L) = \lim_{n\to\infty} (C_L(n,n))^{1/n^2} = \lim_{n\to\infty} \left(\frac{C_L(n,n)}{C_L(n-1,n-1)}\right)^{1/(2n-1)} = \lim_{n\to\infty} \frac{\bigl(C_L(n,n)\,C_L(n-2,n-2)\bigr)^{1/2}}{C_L(n-1,n-1)}$$

if the last two limits exist. Calculating the values of these sequences for the language $D$ (see Table 1),
we see that the last sequence has the best behaviour and allows one to suggest $\alpha(D) \approx 1.2421$. Thus, we
get an additional support to the conjecture that $\alpha(D)$ is the limit of the sequence $\{\alpha(\mathrm{Cyl}_k^{(3)})\}_{k \ge 5}$.
For the rest of the paper, we set $C(n,k) = C_D(n,k)$.
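The three estimating sequences can be reproduced for small $n$ with a row-by-row count. The sketch below rests on our reading of the avoided structures, whose pictures are lost in this text: rows obey the horizontal rules of cylindric words (`|` is followed by `/`, `/` by `\`, and `\` by `|` or `/`), and $D$ forbids `|` directly above `|` as well as the 2 x 2 block whose two rows are both `\/`. All names are ours, and the code is practical only for small sizes.

```python
# Allowed right neighbours of each stick inside a row (assumed reading).
NEXT = {"|": "/", "/": "\\", "\\": "|/"}


def rows(width: int) -> list[str]:
    """All admissible rows of the given width (width >= 1)."""
    result = ["|", "/", "\\"]
    for _ in range(width - 1):
        result = [r + a for r in result for a in NEXT[r[-1]]]
    return result


def stackable(u: str, v: str) -> bool:
    """True if rows u and v may be vertical neighbours (the rule is symmetric)."""
    if any(a == b == "|" for a, b in zip(u, v)):
        return False  # vertical factor of height 2
    # square factor: the pair \/ directly above another pair \/
    return not any(u[i:i + 2] == v[i:i + 2] == "\\/" for i in range(len(u) - 1))


def count_D(height: int, width: int) -> int:
    """C(height, width): the number of height x width words in D."""
    rs = rows(width)
    walks = {r: 1 for r in rs}  # number of partial words ending with row r
    for _ in range(height - 1):
        walks = {v: sum(walks[u] for u in rs if stackable(u, v)) for v in rs}
    return sum(walks.values())


def estimates(n: int) -> tuple[float, float, float]:
    """Values of the three sequences from the displayed formula (n >= 3)."""
    c0, c1, c2 = count_D(n, n), count_D(n - 1, n - 1), count_D(n - 2, n - 2)
    return (c0 ** (1 / n ** 2),
            (c0 / c1) ** (1 / (2 * n - 1)),
            (c0 * c2) ** 0.5 / c1)
```

Small cases can be verified by hand: there are 3 words of size 1 x 1, 4 of size 1 x 2, 8 of size 2 x 1, and 13 of size 2 x 2. Reaching sizes comparable to those in Table 1 would need a far more economical implementation than this sketch.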



Table 1 : Approximation to the growth rate of the 2-dimensional language D. 



n 
1.280207
28
1.260626
30


1.276337 


1.259362 


1.242102

3 Automata

Let us fix an arbitrary $k \ge 5$. We denote the set of all words of width $k$ from $D$ by $D_k$. It is natural
to put $\alpha(D_k) = \lim_{n\to\infty} (C(n,k))^{1/nk}$; then $\lim_{k\to\infty} \alpha(D_k) = \alpha(D)$ as the iterated limit of the
existing double limit. Note that $D_k$ can be also viewed as a 1-dimensional regular language over the
alphabet $A^k$. The automaton $\mathcal{A}$ recognizing $D_k$ can be defined as follows:

(A1) the words of length $k$ from $\mathrm{Cyl}_k^{(3)}$ (they coincide with the words of size $1 \times k$ from $D_k$) are
the vertices;

(A2) an edge $u \to v$ exists if and only if the word of size $2 \times k$ with the rows $u$ and $v$ belongs to $D_k$;
such an edge is labeled by $v$;

(A3) each vertex is both initial and terminal.

Note that $\mathcal{A}$ is an unambiguous nondeterministic automaton recognizing $D_k$ as a language over $A^k$.
The index of $\mathcal{A}$ (and the growth rate of $D_k$ over $A^k$) equals $\alpha(D_k)^k$. The underlying graph of $\mathcal{A}$ is
undirected due to the vertical symmetry of the avoided factors. Let $P_u(n)$ be the number of walks of length
$n$ in $\mathcal{A}$ starting at the vertex $u$, and let $P(n) = \sum_u P_u(n)$ be the number of all walks of length $n$ in $\mathcal{A}$.
Then $P(n) = C(n+1,k)$.

For the language $\mathrm{Cyl}_k^{(3)}$, we build the Rauzy graph $\mathcal{R}$ of order $k+1$. The vertices of this graph are
the words of $\mathrm{Cyl}_k^{(3)}$ of length $k+1$, and a directed edge connects a vertex $u$ to $v$ if and only if some
word of $\mathrm{Cyl}_k^{(3)}$ of length $k+2$ has the prefix $u$ and the suffix $v$. It is easy to see that the edges of $\mathcal{R}$
can be labeled such that $\mathcal{R}$ becomes a deterministic cover automaton (all transitions are deterministic,
all vertices are both initial and terminal), recognizing the language $\mathrm{Cyl}_k^{(3)}$. A deterministic cover
automaton is a special case of an unambiguous nondeterministic automaton; so, the index of $\mathcal{R}$ equals
$\alpha(\mathrm{Cyl}_k^{(3)})$. Now consider the $k$th power $\mathcal{R}^k$ of $\mathcal{R}$. Note that in most cases the correctness of a
transition from some vertex $u$ of $\mathcal{R}^k$ to some other vertex $v$ can be checked using only the last $k$
symbols of $u$. The only exception is the case when the $k$-letter suffix of $u$ begins and ends with /\ :
if $u$ begins with | , then the $k$-letter suffix of $v$ can begin with both | and / , while if $u$ begins with \ ,
then this suffix of $v$ must begin with | to prevent the appearance of the avoided 2 x 2 factor. Let us
require $v$ to begin with | in any case and consider the automaton $\mathcal{B}$ such that

(B1) the words of length $k$ from $\mathrm{Cyl}_k^{(3)}$ (the suffixes of length $k$ of the vertices from $\mathcal{R}^k$) are the
vertices;

(B2) an edge $u \to v$ exists if and only if (a) the automaton $\mathcal{R}^k$ contains the edge $au \to bv$ for some
$a, b \in A$, and (b) if $u$ has the form /\ ... /\ , then $v$ begins with | ; such an edge is labeled by $v$;

(B3) each vertex is both initial and terminal.

We will write $P'_u(n)$ for the number of walks of length $n$ in $\mathcal{B}$ starting at $u$, and $P'(n) = \sum_u P'_u(n)$ for
the number of all walks of length $n$ in $\mathcal{B}$. If we denote the number of words of length $nk$ in the language
$\mathrm{Cyl}_k^{(3)}$ by $C'(n,k)$, then it is easy to see that $P'(n) \le C'(n+1,k) \le C(n+1,k)$.

4 Main result 

Since the indices of automata depend only on their adjacency matrices, below we consider the automata
$\mathcal{A}$ and $\mathcal{B}$ just as digraphs. Recall that they share the same set of vertices and any edge of $\mathcal{B}$ is
contained in $\mathcal{A}$. The outdegrees of a vertex $u$ in $\mathcal{A}$ and $\mathcal{B}$ are denoted respectively by $\deg_{\mathcal{A}}(u)$ and
$\deg_{\mathcal{B}}(u)$. We say that the vertices $u$ and $v$ are similar if they coincide everywhere except, possibly,
in the first 11 letters. Similarity is an equivalence relation; we write $u \sim v$.

Remark 1. The classes of $\sim$ are finite, since the cardinality of such a class is the number of words of
length 11 over A that can be extended by the same suffix. The maximum cardinality of such a class is
$N = 28$ independently of $k$, and is achieved on any suffix that begins with / .
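The value $N = 28$ can be recomputed by listing the admissible words of length 11. The sketch below assumes the horizontal rules implied by the block structure of cylindric words (`|` is followed by `/`, `/` by `\`, and `\` by `|` or `/`); the names are ours.

```python
# Allowed right neighbours of each stick inside a cylindric word (assumed reading).
NEXT = {"|": "/", "/": "\\", "\\": "|/"}


def prefixes_before(first_suffix_letter: str, length: int = 11) -> int:
    """Number of admissible words of the given length that can be followed
    by a suffix starting with `first_suffix_letter`."""
    words = ["|", "/", "\\"]
    for _ in range(length - 1):
        words = [w + a for w in words for a in NEXT[w[-1]]]
    return sum(first_suffix_letter in NEXT[w[-1]] for w in words)


class_sizes = {a: prefixes_before(a) for a in "|/\\"}
```

Under these rules the class sizes are 16, 28, and 21 for suffixes starting with `|`, `/`, and `\` respectively, so the maximum 28 is indeed attained on suffixes that begin with `/`.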

The following two key lemmas hold for any $k \ge 12$ (this restriction is necessary only for the existence
of the 12th symbol in the label of a vertex).

Lemma 1. For any vertex $u = u_1 \cdots u_k$ and any $a \in A$ such that either $a \ne$ | or $u_{12} \ne$ | , there exists
an edge $u \to x$ in $\mathcal{B}$ such that the 12th letter of $x$ is $a$.

Proof. Let $x = x_1 \cdots x_k$. We first show that if the condition of the lemma holds for some $i$th letter
($1 \le i < k$) then it also holds for any $j$th letter ($i < j \le k$). It suffices to check the case $j = i+1$.
Indeed, the minimal structures avoided by the words from $\mathrm{Cyl}_k^{(3)}$ are either factors of length 2, or the
"vertical factor" of height 2, or the "square factor" of size 2 x 2. Thus, the possible values of
$x_{i+1}$ are determined by $u_i$, $u_{i+1}$, and $x_i$; each of these values together with $u_{i+1}$ and $u_{i+2}$ determines
the possible values of $x_{i+2}$, and so on. There are only four possibilities for the factor $u_i u_{i+1}$. For each
of them, we show that if the symbol $x_i$ can take all possible values, then the same is true for $x_{i+1}$, see Fig. 2.

Figure 2: Proving Lemma 1. If $x_i$ can take any value, $x_{i+1}$ can take any value as well.



In order to prove the lemma we find, for each vertex $u$, the number $i_u$ such that the $i_u$th letter of $x$ can
take any value required by the condition of the lemma. If $i_u \le 12$ for any $u$, then we are done with the
proof. So we examine all possible beginnings of $u$ and try to build the word $x_1 \cdots x_{i_u}$ such that $x_{i_u} = a$
for any allowed $a \in A$. Recall that the letter $x_1$ follows $u_k$ in some cylindric word and hence depends on
$u_k$. In order to avoid the consideration of $u_k$ (the restrictions involving $u_k$ depend on $k$), we build the
word $x_1 \cdots x_{i_u}$ for any $x_1 \in A$. The words $x_1 \cdots x_{i_u}$ for all $u$ that begin with | and / are shown in Fig. 3
(cases 1-3 and 4-11, respectively). The maximum value of $i_u$, namely 11, is achieved in case 9. If $u$
begins with \ , then its factor $u_2 \cdots u_{i_u}$ falls into one of the cases 1-11; so, we conclude that $i_u \le 12$. □

Lemma 1 is used to prove another property of similarity.

Lemma 2. If $u \sim v$ and $u \to x$ is an edge in $\mathcal{A}$, then there exists an edge $v \to y$ in $\mathcal{B}$ such that $x \sim y$.

Proof. Let $u = u_1 \cdots u_k$, $x = x_1 \cdots x_k$, $v = v_1 \cdots v_k$; we have to find the vertex $y = y_1 \cdots y_k$. Assume
that we know only the letters $u_{12}, \ldots, u_k$, and $x_{12}$. Then we still can restore all possible values of the
factor $x_{13} \cdots x_k$ independently of the letters $u_1, \ldots, u_{11}, x_1, \ldots, x_{11}$ (cf. the proof of Lemma 1).

Now consider all $y$'s such that $v \to y$ is an edge in $\mathcal{B}$ and $y_{12} = x_{12}$. The set of all such $y$'s is
nonempty by Lemma 1. Since $v_{12} \cdots v_k = u_{12} \cdots u_k$ by similarity of $u$ and $v$, the set of all possible values
of the factor $y_{13} \cdots y_k$ coincides with such a set for the factor $x_{13} \cdots x_k$. Thus, we can pick $y$ so that the
factor $y_{12} \cdots y_k$ equals $x_{12} \cdots x_k$ for the actual value of $x$. Then $x \sim y$, and the lemma is proved. □

Theorem 2. The limit $\lim_{k\to\infty} \alpha(T_k^{(3)})$ exists and is equal to $\alpha(D)$.

Figure 3: Proving Lemma 1. Cases 1-11 represent different beginnings of the word $u$. Under each beginning,
some possible beginnings of the word $x$ are drawn. For each possible first letter of $x$, we exhibit such
beginnings ending with all possible letters. In some cases, not all possible beginnings of $x$ are drawn; for
such missing beginnings, case 3 refers to case 2, case 6 to case 5, cases 8 and 9 to case 7, and case 11 to case 10.
Proof. Recall that $\alpha(T_k^{(3)}) = \alpha(\mathrm{Cyl}_k^{(3)})$. Since the sequence $(C_{\mathrm{Cyl}_k^{(3)}}(n))^{1/n}$ converges to $\alpha(\mathrm{Cyl}_k^{(3)})$,
so does any of its subsequences. Hence, $\alpha(\mathrm{Cyl}_k^{(3)}) = \lim_{n\to\infty} (C'(n,k))^{1/nk}$. On the other hand, we
know that $\alpha(D) = \lim_{k\to\infty} \alpha(D_k) = \lim_{k\to\infty} \lim_{n\to\infty} (C(n,k))^{1/nk}$. Thus, let us estimate the ratio
$(C'(n,k)/C(n,k))^{1/nk}$. The upper bound $(C'(n,k)/C(n,k))^{1/nk} \le 1$ is trivial. In order to get the
lower bound, we recall that $C(n+1,k) = P(n)$ and $C'(n+1,k) \ge P'(n)$.

Let us fix an arbitrary vertex $u$ and consider the $\mathcal{A}$-tree (for $u$) defined as follows. The vertices of
this tree are labeled by the vertices of $\mathcal{A}$, $u$ being the label of the root. Any vertex labeled by $v$ has
$\deg_{\mathcal{A}}(v)$ children; the children are labeled by all forward neighbours of $v$ in $\mathcal{A}$. Thus, there is a natural
bijection between the set of vertices of level $n$ in the $\mathcal{A}$-tree and the set of all walks from $u$ of length $n$
in the automaton $\mathcal{A}$. That is, the $n$th level of the $\mathcal{A}$-tree contains exactly $P_u(n)$ vertices. The $\mathcal{B}$-tree is
defined in the same way, using $\mathcal{B}$ instead of $\mathcal{A}$. The $n$th level of the $\mathcal{B}$-tree contains $P'_u(n)$ vertices.

Using Lemma 2 inductively, we get that the label of any vertex of the $n$th level in the $\mathcal{A}$-tree is similar
to the label of some vertex of the $n$th level in the $\mathcal{B}$-tree. Let us start from the roots of the trees and
inductively construct a total map $\mu$ from the $\mathcal{A}$-tree to the $\mathcal{B}$-tree satisfying the following conditions:

(1) if $s$ is a level $n$ vertex labeled by $x$, then $\mu(s)$ is a level $n$ vertex labeled by some $y \sim x$;

(2) $\mu(\mathrm{parent}(s)) = \mathrm{parent}(\mu(s))$.

The existence of such a map is ensured by Lemma 2 and the structure of the trees.

Now we take a level $n$ vertex $t$ from the $\mathcal{B}$-tree and estimate the size of the set $\mu^{-1}(t)$. Assume
that $|\mu^{-1}(\mathrm{parent}(t))| = K$. If $s$ is mapped to $t$, then $\mathrm{parent}(s) \in \mu^{-1}(\mathrm{parent}(t))$. All children of the
vertex $\mathrm{parent}(s)$ are different. Hence, by Remark 1, at most $N$ of these children can be mapped to $t$.
Thus, $|\mu^{-1}(t)| \le KN$. The case $n = 0$ gives us $|\mu^{-1}(t)| = 1$, whence we obtain $|\mu^{-1}(t)| \le N^n$. Since
$\mu$ is total, we have $P_u(n) \le N^n P'_u(n)$. Summing up these inequalities for all vertices $u$, we finally get
$P(n) \le N^n P'(n)$.

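Returning to combinatorial complexities, we can write

$$\frac{1}{N^n} \le \frac{P'(n)}{P(n)} \le \frac{C'(n+1,k)}{C(n+1,k)} \le 1,$$

$$\left(\frac{1}{N^n}\right)^{1/(n+1)k} \le \left(\frac{C'(n+1,k)}{C(n+1,k)}\right)^{1/(n+1)k} \le 1.$$

Taking the limits of all sides as $n \to \infty$, we get

$$\left(\frac{1}{N}\right)^{1/k} \le \frac{\alpha(T_k^{(3)})}{\alpha(D_k)} \le 1.$$

Now we let $k \to \infty$ and use the squeeze theorem to conclude that the limit $\lim_{k\to\infty} \alpha(T_k^{(3)})/\alpha(D_k)$ exists
and is equal to 1 (recall that $N$ is independent of $k$). Since the limit $\lim_{k\to\infty} \alpha(D_k) = \alpha(D)$ also exists,
we have

$$\alpha(D) = \alpha(D) \cdot 1 = \lim_{k\to\infty} \alpha(D_k) \cdot \lim_{k\to\infty} \frac{\alpha(T_k^{(3)})}{\alpha(D_k)} = \lim_{k\to\infty} \left(\frac{\alpha(T_k^{(3)})}{\alpha(D_k)} \cdot \alpha(D_k)\right) = \lim_{k\to\infty} \alpha(T_k^{(3)}),$$

as desired. □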
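The squeeze step relies on the fact that the lower bound $(1/N)^{1/k}$ obtained in the proof tends to 1 when $N$ is fixed and $k$ grows. A small numerical illustration follows; the function name and the sample values of $k$ are ours, and $N = 28$ is the constant from Remark 1.

```python
def lower_bound(k: int, N: int = 28) -> float:
    """The left-hand side (1/N)**(1/k) of the two-sided estimate."""
    return N ** (-1.0 / k)


# The bound increases towards 1 as the alphabet size k grows.
bounds = {k: lower_bound(k) for k in (12, 50, 100, 1000)}
```

Already for $k = 12$ the bound exceeds $0.75$, and for $k = 1000$ it exceeds $0.99$, so the ratio of the two growth rates is pinched between this value and 1.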



Remark 2. From the proof of the above theorem it is clear that the actual value of the constant $N$ such
that $P(n) \approx N^n P'(n)$ is much smaller than 28. Computations show that $N \approx 2.119$. Hence, the set $D_k$ of
2-dimensional words of width $k$ is not much bigger than the corresponding set $\mathrm{Cyl}_k^{(3)}$ of cylindric words.